29 research outputs found

    Listening to features

    Get PDF
    This work explores nonparametric methods which aim at synthesizing audio from low-dimensionnal acoustic features typically used in MIR frameworks. Several issues prevent this task to be straightforwardly achieved. Such features are designed for analysis and not for synthesis, thus favoring high-level description over easily inverted acoustic representation. Whereas some previous studies already considered the problem of synthesizing audio from features such as Mel-Frequency Cepstral Coefficients, they mainly relied on the explicit formula used to compute those features in order to inverse them. Here, we instead adopt a simple blind approach, where arbitrary sets of features can be used during synthesis and where reconstruction is exemplar-based. After testing the approach on a speech synthesis from well known features problem, we apply it to the more complex task of inverting songs from the Million Song Dataset. What makes this task harder is twofold. First, that features are irregularly spaced in the temporal domain according to an onset-based segmentation. Second the exact method used to compute these features is unknown, although the features for new audio can be computed using their API as a black-box. In this paper, we detail these difficulties and present a framework to nonetheless attempting such synthesis by concatenating audio samples from a training dataset, whose features have been computed beforehand. Samples are selected at the segment level, in the feature space with a simple nearest neighbor search. Additionnal constraints can then be defined to enhance the synthesis pertinence. Preliminary experiments are presented using RWC and GTZAN audio datasets to synthesize tracks from the Million Song Dataset.Comment: Technical Repor

    Hydrogen emissions from Erebus volcano, Antarctica

    Get PDF
    International audienceThe continuous measurement of molecular hydrogen (H2) emissions from passively degassing volcanoes has recently been made possible using a new generation of low-cost electrochemical sensors. We have used such sensors to measure H2, along with SO2, H2O and CO2, in the gas and aerosol plume emitted from the phonolite lava lake at Erebus volcano, Antarctica. The measurements were made at the crater rim between December 2010 and January 2011. Combined with measurements of the long-term SO2 emission rate for Erebus, they indicate a characteristic H2 flux of 0.03 kg s-1 (2.8 Mg day-1). The observed H2 content in the plume is consistent with previous estimates of redox conditions in the lava lake inferred from mineral compositions and the observed CO2/CO ratio in the gas plume (∼0.9 log units below the quartz-fayalite-magnetite buffer). These measurements suggest that H2 does not combust at the surface of the lake, and that H2 is kinetically inert in the gas/aerosol plume, retaining the signature of the high-temperature chemical equilibrium reached in the lava lake. We also observe a cyclical variation in the H2/SO2 ratio with a period of ∼10 min. These cycles correspond to oscillatory patterns of surface motion of the lava lake that have been interpreted as signs of a pulsatory magma supply at the top of the magmatic conduit

    Audio Signal Representations for Factorization in the sparse domain

    Get PDF
    International audienceIn this paper, a new class of audio representations is introduced, together with a corresponding fast decomposition algorithm. The main feature of these representations is that they are both sparse and approximately shift-invariant, which allows similarity search in a sparse domain. The common sparse support of detected similar patterns is then used to factorize their representations. The potential of this method for simultaneous structural analysis and compressing tasks is illustrated by preliminary experiments on simple musical data

    Matching Pursuits with Random Sequential Subdictionaries

    Get PDF
    Matching pursuits are a class of greedy algorithms commonly used in signal processing, for solving the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be computationally costly for large dictionaries and burdens their competitiveness in coding applications. We propose using a non adaptive random sequence of subdictionaries in the decomposition process, thus parsing a large dictionary in a probabilistic fashion with no additional projection cost nor parameter estimation. A theoretical modeling based on order statistics is provided, along with experimental evidence showing that the novel algorithm can be efficiently used on sparse approximation problems. An application to audio signal compression with multiscale time-frequency dictionaries is presented, along with a discussion of the complexity and practical implementations.Comment: 20 pages - accepted 2nd April 2012 at Elsevier Signal Processin

    Représentations redondantes et hiérarchiques pour l'archivage et la compression de scènes sonores

    Get PDF
    L'objet de cette thèse est l'analyse et le traitement automatique de grands volumes de données audio. Plus particulièrement, on s'intéresse à l'archivage, tâche qui regroupe, au moins, deux problématiques: la compression des données, et l'indexation du contenu de celles-ci. Ces deux problématiques définissent chacune des objectifs, parfois concurrents, dont la prise en compte simultanée s'avère donc difficile. Au centre de cette thèse, il y a donc la volonté de construire un cadre cohérent à la fois pour la compression et pour l'indexation d'archives sonores. Les représentations parcimonieuses de signaux dans des dictionnaires redondants ont récemment montré leur capacité à remplir une telle fonction. Leurs propriétés ainsi que les méthodes et algorithmes permettant de les obtenir sont donc étudiés dans une première partie de cette thèse. Le cadre applicatif relativement contraignant (volume des données) va nous amener à choisir parmi ces derniers des algorithmes itératifs, appelés également gloutons. Une première contribution de cette thèse consiste en la proposition de variantes du célèbre Matching Pursuit basées sur un sous-échantillonnage aléatoire et dynamique de dictionnaires. L'adaptation au cas de dictionnaires temps-fréquence structurés (union de bases de cosinus locaux) nous permet d'espérer une amélioration significative des performances en compression de scènes sonores. Ces nouveaux algorithmes s'accompagnent d'une modélisation statistique originale des propriétés de convergence usant d'outils empruntés à la théorie des valeurs extrêmes. Les autres contributions de cette thèse s'attaquent au second membre du problème d'archivage: l'indexation. Le même cadre est cette fois-ci envisagé pour mettre à jour les différents niveaux de structuration des données. Au premier plan, la détection de redondances et répétitions. A grande échelle, un système robuste de détection de motifs récurrents dans un flux radiophonique par comparaison d'empreintes est proposé. Ses performances comparatives sur une campagne d'évaluation du projet QUAERO confirment la pertinence de cette approche. L'exploitation des structures pour un contexte autre que la compression est également envisagé. Nous proposons en particulier une application à la séparation de sources informée par la redondance pour illustrer la variété de traitements que le cadre choisi autorise. La synthèse des différents éléments permet alors d'envisager un système d'archivage répondant aux contraintes par la hiérarchisation des objectifs et des traitements.The main goal of this work is automated processing of large volumes of audio data. Most specifically, one is interested in archiving, a process that encompass at least two distinct problems: data compression and data indexing. Jointly addressing these problems is a difficult task since many of their objectives may be concurrent. Therefore, building a consistent framework for audio archival is the matter of this thesis. Sparse representations of signals in redundant dictionaries have recently been found of interest for many sub-problems of the archival task. Sparsity is a desirable property both for compression and for indexing. Methods and algorithms to build such representations are the first topic of this thesis. Given the dimensionality of the considered data, greedy algorithms will be particularly studied. A first contribution of this thesis is the proposal of a variant of the famous Matching Pursuit algorithm, that exploits randomness and sub-sampling of very large time frequency dictionaries. We show that audio compression (especially at low bit-rate) can be improved using this method. This new algorithms comes with an original modeling of asymptotic pursuit behaviors, using order statistics and tools from extreme values theory. Other contributions deal with the second member of the archival problem: indexing. The same framework is used and applied to different layers of signal structures. First, redundancies and musical repetition detection is addressed. At larger scale, we investigate audio fingerprinting schemes and apply it to radio broadcast on-line segmentation. Performances have been evaluated during an international campaign within the QUAERO project. Finally, the same framework is used to perform source separation informed by the redundancy. All these elements validate the proposed framework for the audio archiving task. The layered structures of audio data are accessed hierarchically by greedy decomposition algorithms and allow processing the different objectives of archival at different steps, thus addressing them within the same framework.PARIS-Télécom ParisTech (751132302) / SudocSudocFranceF

    Explainability in Music Recommender Systems

    Full text link
    The most common way to listen to recorded music nowadays is via streaming platforms which provide access to tens of millions of tracks. To assist users in effectively browsing these large catalogs, the integration of Music Recommender Systems (MRSs) has become essential. Current real-world MRSs are often quite complex and optimized for recommendation accuracy. They combine several building blocks based on collaborative filtering and content-based recommendation. This complexity can hinder the ability to explain recommendations to end users, which is particularly important for recommendations perceived as unexpected or inappropriate. While pure recommendation performance often correlates with user satisfaction, explainability has a positive impact on other factors such as trust and forgiveness, which are ultimately essential to maintain user loyalty. In this article, we discuss how explainability can be addressed in the context of MRSs. We provide perspectives on how explainability could improve music recommendation algorithms and enhance user experience. First, we review common dimensions and goals of recommenders' explainability and in general of eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which these apply -- or need to be adapted -- to the specific characteristics of music consumption and recommendation. Then, we show how explainability components can be integrated within a MRS and in what form explanations can be provided. Since the evaluation of explanation quality is decoupled from pure accuracy-based evaluation criteria, we also discuss requirements and strategies for evaluating explanations of music recommendations. Finally, we describe the current challenges for introducing explainability within a large-scale industrial music recommender system and provide research perspectives.Comment: To appear in AI Magazine, Special Topic on Recommender Systems 202

    WASABI: a Two Million Song Database Project with Audio and Cultural Metadata plus WebAudio enhanced Client Applications

    Get PDF
    This paper presents the WASABI project, started in 2017, which aims at (1) the construction of a 2 million song knowledge base that combines metadata collected from music databases on the Web, metadata resulting from the analysis of song lyrics, and metadata resulting from the audio analysis, and (2) the development of semantic applications with high added value to exploit this semantic database. A preliminary version of the WASABI database is already online1 and will be enriched all along the project. The main originality of this project is the collaboration between the algorithms that will extract semantic metadata from the web and from song lyrics with the algorithms that will work on the audio. The following WebAudio enhanced applications will be associated with each song in the database: an online mixing table, guitar amp simulations with a virtual pedal-board, audio analysis visualization tools, annotation tools, a similarity search tool that works by uploading audio extracts or playing some melody using a MIDI device are planned as companions for the WASABI database

    QCD and strongly coupled gauge theories : challenges and perspectives

    Get PDF
    We highlight the progress, current status, and open challenges of QCD-driven physics, in theory and in experiment. We discuss how the strong interaction is intimately connected to a broad sweep of physical problems, in settings ranging from astrophysics and cosmology to strongly coupled, complex systems in particle and condensed-matter physics, as well as to searches for physics beyond the Standard Model. We also discuss how success in describing the strong interaction impacts other fields, and, in turn, how such subjects can impact studies of the strong interaction. In the course of the work we offer a perspective on the many research streams which flow into and out of QCD, as well as a vision for future developments.Peer reviewe
    corecore